|
|
|
|
Suppose that you are putting together a hi-fi system. You have some big woofers to use for the bass and some tweeters for the mid-range and treble. How do you break up the music signal so that the low frequencies go to the woofers and the high frequencies go to the tweeters?
You could build high-pass and low-pass filters using resistors and capacitors. A capacitor consists of two metal plates separated by an insulator. Charge can build up on one side, but cannot jump across. Thus to direct current (DC), a capacitor presents infinite resistance. If the current alternates fast enough, then the charge on each side of the capacitor goes back and forth between + and – and the dancing electrons on one side make the holes on the other side dance the same way. Thus the alternating current (AC) behaves the same way on both sides; a capacitor has no resistance to fast AC.
A low pass filter drains the fast AC to ground, letting only the DC and slow AC get through. The slow AC is the low frequency signal for the bass.
A high pass filter only lets the fast AC get through. These high frequency components of the signal form the treble.
Suppose that we want to do the same thing using a DSP. We have an acoustic signal from a CD sampled 44,100 times per second. How do we break this up into high and low frequencies?
Any reasonable function h(t) that is periodic with period T can be written as a sum of sines and cosines.
How do we find the values of the coefficients an and bn? They come from the formula:
Suppose we have a 32.7 Hz symmetric triangle wave with minimum value 0 and maximum 1.
The period T = 1/32.7Hz = 0.0306 seconds.
Let’s assume that the wave has value 0 at time t = 0 seconds. The equation for the triangle wave is:
If n=0, this reduces to
If n¹0, use the formula
To find the coefficients for the sine terms, we need to solve (n= 1,2,3…)
We can save some work by noticing that the triangle wave h(t) is an even function: h(-t) = h(t) for any value of t. Sine is an odd function: sin(-t) = -sin(t). The product of an even function and an odd function is odd: h(-t)sin(-t) = -h(t)sin(t).
The values of bn are found by integrating h(t)sin(2pnt/T) over one full period. We can do that over any period, for example from –T/2 to T/2.
Making a change of variable s=-t gives
Using the fact that the function is odd:
Thus the Fourier series for the triangle wave is
[Triangle.xls]
The lowest C on the piano is 32.7 Hz. Let’s split the tone so that the frequencies below middle C (261.63 Hz) go to the bass speakers and the notes above middle C go to the treble speakers. Then the cosine waves for n= 1,3,5,7 are in the bass range and n=9,11,13… are in the treble range.
2-D real vectors and complex numbers are equivalent.
Points of the plane can be represented:
as rectangular coordinates (x, y)
as a complex number: x + i y where i = sqrt(-1)
as polar coordinates: (r, q) or r Ðq
If we have a frequency f at a phase f, it is natural to write it as f Ðf
When we convert this to rectangular coordinates, we get
(x, y) = (f cos f, f sin f)
z = x + i y = f cos f + i f sin f º f cis(f) º f eif
The complex exponential is the Euler identity.
r = sqrt( x2 + y2) magnitude or abs(z)
angle = atan2( y, x) argument of z arg(z)
Complex conjugate of z is z* = x - iy
Some computations are easier in rectangular coordinates:
z1 + z2 = (x1 + x2) + i(y1 + y2)
z1 - z2 = (x1 - x2) + i(y1 - y2)
Others are easier in polar coordinates:
z1 z2 = r1 r2 Ðq1+q2
z1 / z2 = r1 / r2 Ðq1-q2
zn = rn Ðnq
Electrical engineers use “i” to mean current. They often use “j” for the square root of –1.
Using Euler's identity, we can write
The coefficients cn come from the formula:
We have replaced the real coefficients an for cosine terms and bn for sine terms with the complex coefficient cn = an + j bn.
The mathematical derivation of this formula is based on vector spaces. The set of all periodic functions with period T can be considered a vector space. The set of complex exponential functions {exp(j2pnt/T)} is a basis for the vector space. In this vector space, the inner product of two periodic functions h(t) and g(t) is defined by
where the * indicates the complex conjugate.
You can verify that the basis set {exp(j2pnt/T)} is orthonormal:
<exp(j2pnt/T), exp(j2pnt/T)> = 1
<exp(j2pnt/T),
exp(j2pmt/T)> =
0 for n ¹ m
A vector in a vector space can be written as the sum of its projection [inner product] onto each element in the basis times that basis element.
but the definition of the inner product of h(t) and the exponential is exactly the formula for the coefficient cn.
Look at the definition of the complex coefficients of the Fourier series.
Instead of a continuous function h(t), consider a discrete function defined at t = t0+nDt, n=0, 1, … N-1
Then the integral in the formula for the coefficients is replaced by a sum. Since h(t) is only defined at N points, there are only N different values for the coefficients. These coefficients carry just as much information as the original signal. We can look at either the original time domain signal h(t) or the list of coefficients H(f) in the frequency domain.
where H(f) is defined at f = nDf, n=0, 1, … N-1
T = NDt is the period
of h(t).
Dt is the sampling rate.
f0 = 0 is DC. H(0) is the average value of the signal h(t).
f1 = 1/Dt is the fundamental frequency.
fs = NDf is the sampling frequency. This is the period of H(f).
ff = fs/2 is the folding frequency.
Values between N/2 and N can be viewed as negative frequency harmonics between ‑N/2 and 0.
n is the harmonic number for the frequency or the sample number for t.
The bandwidth of a signal is the width of the frequency range for which the Fourier spectrum is substantially non-zero.
A low-pass signal contains only low frequency components.
A high-pass signal has only high frequency.
A band-pass signal does not contain a zero frequency component.
Nyquist limit: The sampling rate must be short enough to detect twice the highest frequency of interest. Thus the highest frequency that can be resolved is the folding frequency, ff.
Thus if we want to be able to detect frequencies up to 20KHz, we need to sample at 40KHz. If we plan to use a 4096 point discrete Fourier transform,
fs = 40,000 = NDf = 4096 Df
Frequency resolution, Df = 9.77 Hz.
The sampling rate would be once every 250 microseconds.
Symmetries:
If h(t) is real, then Re(H(f)) is even and Im(H(f)) is odd. Re(H(f)) is symmetric about ff.
Given the following discrete functions:
t |
0 |
1 |
2 |
3 |
h(t) |
1 |
2 |
3 |
4 |
g(t) |
1 |
0 |
-2 |
-1 |
Assume that at all other points, the functions have value 0.
a) Compute the convolution h*g
|
h(n)g(t-n) at n= |
|
|||
t |
0 |
1 |
2 |
3 |
h(t)*g(t) |
-1 |
1(0) |
2(0) |
3(0) |
4(0) |
0 |
0 |
1(1) |
2(0) |
3(0) |
4(0) |
1 |
1 |
1(0) |
2(1) |
3(0) |
4(0) |
2 |
2 |
1(-2) |
2(0) |
3(1) |
4(0) |
1 |
3 |
1(-1) |
2(-2) |
3(0) |
4(1) |
-1 |
4 |
1(0) |
2(-1) |
3(-2) |
4(0) |
-8 |
5 |
1(0) |
2(0) |
3(-1) |
4(-2) |
-11 |
6 |
1(0) |
2(0) |
3(0) |
4(-1) |
-4 |
7 |
1(0) |
2(0) |
3(0) |
4(0) |
0 |
b) Compute the correlation h°g.
|
h*(n)g(t+n) at n= |
|
|||
t |
0 |
1 |
2 |
3 |
h(t)°g(t) |
-3 |
1(0) |
2(0) |
3(0) |
4(1) |
4 |
-2 |
1(0) |
2(0) |
3(1) |
4(0) |
3 |
-1 |
1(0) |
2(1) |
3(0) |
4(-2) |
-6 |
0 |
1(1) |
2(0) |
3(-2) |
4(-1) |
-9 |
1 |
1(0) |
2(-2) |
3(-1) |
4(0) |
-7 |
2 |
1(-2) |
2(-1) |
3(0) |
4(0) |
-4 |
3 |
1(-1) |
2(0) |
3(0) |
4(0) |
-1 |
4 |
1(0) |
2(0) |
3(0) |
4(0) |
0 |
The terms inside the brackets are 1-D Fourier transforms.
The computation is separable, which greatly reduces the computational effort.
1) Rectangle and sinc form a FT pair. See p. 83 of Gonzalez and Woods.
2) Delta and 1 are a FT pair. The constant function 1 can be considered the limit as a rectangle gets infinitely wide. The delta is the limit as the sinc gets infinitely small.
The rectangle and the delta can be considered different ways of sampling a continuous signal to get a discrete signal. Note that if the width of the rectangle is Dt, the corresponding sinc function is 1 at its central position and 0 at all the values that correspond to Dt. Figure 4-3 from Masters shows how the sinc function shows up as “leakage”. Fig 3.1, 3.2, 3.3 from Gonzalez and Woods show how the sinc function appears as 2D spectra.
3)
4)
5) Gaussian:
Translation: Magnitude of FT does not change when you translate the image.
Equations 3.3-6 and 3.3-7 of Gonzalez and Woods.
Rotation: When an image is rotated, its FT is rotated by the same angle.
Scaling: When the position variables are scaled, the frequency variables scale inversely.
Equation 3.3-17 of Gonzalez and Woods.
Average value of the image is found at the DC component of the FT.
Correlations and convolutions are computationally much faster using FFT than by a direct computation. If you want to convolve with a small mask (3x3 or 5x5) you may be able to do it with fast hardware as discussed earlier. If you want to convolve with a larger mash (11 x 11) it is faster to do a FFT of image and mask, multiply the two, and take the inverse Fourier transform.
If you assume that your signal is not periodic, you are working with the signal multiplied by a rectangle function. When you take the FT, the rectangle transforms to a sinc and introduces undesirable side-lobes that produce “leakage”. To reduce this, you should window your data with something that slopes off more gracefully than the “boxcar” filter. See Masters, pp.92-95.
Some popular windows:
Welch:
Hanning:
Hamming:
Suppose that we want to take an 8 point DFT of a function h(t). If we do it by brute force, we have
H(f) = Re(H(f)) + j Im(H(f))
f |
Re(H(f)) |
Im(H(f)) |
0 |
|
|
1 |
|
|
2 |
|
|
3 |
|
|
4 |
|
|
5 |
|
|
6 |
|
|
7 |
|
|
If we do this naively, we would compute trigonometric terms 128 times, but there are only 8 different values for the cosine and 8 different values for the sine. There are fewer values than that if we make use of the symmetries of the trig functions.
[By the way, the 8 complex values of the trig functions are the 8th roots of 1].
Once we compute a table of the trig values that we need, a direct computation would need 8 multiply-add operations for each value for Re(H(f)) and 8 for Im(H(f)). This is 128 operations, or 2 N2 for computational order O(N2)
The trigonometric values that we need are:
f t |
q |
cos(q) |
sin(q) |
0 |
0 |
1 |
0 |
1 |
p/4 |
Ö2/2 |
Ö2/2 |
2 |
p/2 |
0 |
1 |
3 |
3p/4 |
-Ö2/2 |
Ö2/2 |
4 |
p |
-1 |
0 |
5 |
5p/4 |
-Ö2/2 |
-Ö2/2 |
6 |
3p/2 |
0 |
-1 |
7 |
7p/4 |
Ö2/2 |
-Ö2/2 |
where
A naïve computation of the DFT would take N2 operations. However, we can break the series in half, and compute the DFT of each separately, with each half taking ¼ the time. When we do this recursively, the time for the transform becomes N log(N). This is a huge computational savings and it makes the Fast Fourier Transform (FFT) practical.
See section 3.4.1 of Gonzalez and Woods.
Almost all computational Fourier transforms are done using the FFT instead of the direct method. The FFT works best when applied to a signal whose size is a power of 2. This is the main reason why research image sizes are commonly taken as 64x64, 128x128, 256x256, etc. If you want to take a FT of an image that is not a power of two, you can:
1) Embed the image in a larger image that is a power of 2.
2) Use some clever algorithms that can be applied to other numbers
3) Do a direct DFT (last resort)